
Poster: connellybarnes Date: May 6, 2004 10:22am
Forum: web Subject: Number of sites archived per year


Can I get the statistics on the total number of pages archived by year?

I intend to use this to measure public interest in topics. For example, I can type "Zoology", and find the number of pages mentioning Zoology in 2001. However, there is no "total pages in the archive in 2001" figure to compare this against. So it is not currently possible to measure public interest in e.g. Zoology using IA.

I tried searching Recall with common English words like "Although". However, this doesn't work, because Recall won't give the # of hits for common words.

Any helpful feedback would be appreciated.


Poster: molly (Administrator, Curator, or Staff) Date: May 8, 2004 3:22am
Forum: web Subject: Re: Number of sites archived per year

It is also important to note that Recall is still in beta, and is only searching about 1/3 of our web collection. So the numbers from your "Zoology" search don't cover all of 2001 across our whole web collection.


Poster: Igor Ranitovic Date: May 7, 2004 4:23am
Forum: web Subject: Re: Number of sites archived per year

Some time ago I did an analysis of the total number of documents (including duplicates) in our Web collection up to 2001.

Total documents per year:
1996 35573795
1997 352255060
1998 276999863
1999 1080306733
2000 4219167381
2001 7024238537

Total html pages per year:
1996 20475783
1997 214802568
1998 195422208
1999 757339164
2000 2184768312
2001 4452127202

These figures should not be used in any scientific research. They are pretty accurate, but the purpose of this exercise was only to better understand the size and nature of our Web collection at the time.
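
As a rough illustration only, here is how the totals above could serve as the per-year denominator the original question asks for. This is a minimal Python sketch, not anything IA runs: the function names are made up for this example, and the hit count in the usage part is a hypothetical placeholder, not a real Recall result. Keep in mind that Recall searches only about 1/3 of the collection, so a Recall hit count divided by these full-collection totals would understate the true rate.

```python
# Totals copied from the figures posted above (documents include duplicates).
TOTAL_DOCS = {
    1996: 35573795,
    1997: 352255060,
    1998: 276999863,
    1999: 1080306733,
    2000: 4219167381,
    2001: 7024238537,
}

TOTAL_HTML = {
    1996: 20475783,
    1997: 214802568,
    1998: 195422208,
    1999: 757339164,
    2000: 2184768312,
    2001: 4452127202,
}


def html_share(year):
    """Fraction of that year's archived documents that were HTML pages."""
    return TOTAL_HTML[year] / TOTAL_DOCS[year]


def relative_interest(hits, year, totals=TOTAL_HTML):
    """Hits for a term divided by the year's total: a crude per-page rate."""
    return hits / totals[year]


if __name__ == "__main__":
    # HTML share of the collection, year by year.
    for year in sorted(TOTAL_DOCS):
        print(year, f"{html_share(year):.1%}")

    # Hypothetical hit count for some term in 2001 (NOT a real search result).
    example_hits = 10_000
    print(relative_interest(example_hits, 2001))
```

Dividing by the HTML totals rather than the document totals is just one choice; pass `totals=TOTAL_DOCS` if the hit counts include non-HTML documents.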

In general, Internet Archive does not have enough staff to conduct these types of data analyses. Also, we see ourselves more as a library that provides access to knowledge but does not necessarily conduct studies on it. Besides Wayback Machine and Recall access to the Web collection, we are working on redesigning our researcher access. IA researcher access allows researchers to get an account on our cluster to do their own research on our Web collection. Unfortunately, due to the work in progress, we will not be able to process any new researcher requests within the next 3 months or so.

For updates on researcher access please check http://www.archive.org/web/researcher/researcher.php

Let me know if I can help more.
i.